- Large language models (LLMs) are fast becoming ubiquitous and have shown impressive performance on a wide range of natural language processing (NLP) tasks. Annotating data for downstream applications is a resource-intensive task in NLP, and LLMs have recently been explored as cost-effective data annotators, either to label training data for other models or as an assistive tool. Yet little is known about the societal implications of using LLMs for data annotation. In this work, focusing on hate speech detection, we investigate how using LLMs such as GPT-4 and Llama-3 can lead to performance disparities across text dialects and to racial bias in online hate detection classifiers. We used LLMs to predict hate speech in seven hate speech datasets and trained classifiers on the LLM annotations of each dataset. Using tweets written in African-American English (AAE) and Standard American English (SAE), we show that classifiers trained on LLM annotations assign AAE tweets to negative classes (e.g., hate, offensive, abuse, racism) at a higher rate than SAE tweets, and that the classifiers have a higher false positive rate on AAE tweets. We also explore the effect of incorporating dialect priming into the prediction prompts, showing that introducing dialect information increases the rate at which AAE tweets are assigned to negative classes. (Free, publicly accessible full text available June 7, 2026.)
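To make the disparity measurement concrete, below is a minimal sketch (not the authors' code) of the kind of check the abstract describes: given a classifier trained on LLM-produced hate-speech labels, compare its false positive rate on AAE-written tweets versus SAE-written tweets. The record fields ("text", "gold_label", "dialect") and the sklearn-style `clf` object are illustrative assumptions.

```python
# Hedged sketch: false-positive-rate (FPR) gap by dialect for a hate-speech classifier.
# Assumes records like {"text": ..., "gold_label": 0 or 1, "dialect": "AAE" or "SAE"}
# and any classifier exposing a sklearn-style .predict() method.
from typing import Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN); label 1 = the 'negative' class (hate/offensive/abusive)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def dialect_fpr_gap(records, clf):
    """Compute per-dialect FPR and the AAE-minus-SAE gap (positive = bias against AAE)."""
    by_dialect = {"AAE": ([], []), "SAE": ([], [])}
    for r in records:
        gold, pred = by_dialect[r["dialect"]]
        gold.append(r["gold_label"])
        pred.append(clf.predict([r["text"]])[0])  # illustrative sklearn-style call
    fpr = {d: false_positive_rate(g, p) for d, (g, p) in by_dialect.items()}
    fpr["gap"] = fpr["AAE"] - fpr["SAE"]
    return fpr
```

Dialect priming, as used in the abstract, would amount to adding dialect information to the LLM classification prompt (for example, stating that a tweet is written in AAE before asking for a label); any specific prompt wording here would be illustrative rather than the authors' exact template.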
- Artificial Intelligence (AI) technologies have become increasingly pervasive in our daily lives. Recent breakthroughs such as large language models (LLMs) are being used worldwide to improve how people work and to boost productivity. However, the advent of these technologies has also brought new challenges in the critical area of social cybersecurity. While AI has opened new frontiers in addressing social issues such as cyberharassment and cyberbullying, it has also worsened existing problems such as the generation of hateful content, bias, and demographic prejudice. Although the interplay between AI and social cybersecurity has gained much attention from the research community, very few educational materials have been designed to engage students by integrating AI and socially relevant cybersecurity through an interdisciplinary approach. In this paper, we present our newly designed open-learning platform, which can be used to meet the ever-increasing demand for advanced training at the intersection of AI and social cybersecurity. The platform consists of hands-on labs and educational materials that incorporate the latest research results in AI-based social cybersecurity, such as cyberharassment detection, AI bias and prejudice, and adversarial attacks on AI-powered systems, and it is implemented using Jupyter Notebook, an open-source interactive computing platform, for effective hands-on learning. Through a user study of 201 students from two universities, we demonstrate that students have a better understanding of AI-based social cybersecurity issues and their mitigation after completing the labs, and that they are enthusiastic about learning to use AI algorithms to address social cybersecurity challenges for social good. (Free, publicly accessible full text available April 20, 2026.)
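To give a flavor of the "adversarial attacks on AI-powered systems" theme mentioned above, here is a toy, self-contained illustration of the kind of exercise a Jupyter-based lab might contain; it is not taken from the platform itself. The keyword list, `toy_detector`, and `homoglyph_attack` are all hypothetical names: a trivial keyword-based toxicity detector is evaded by swapping letters for visually similar characters, which mirrors the evasion idea used against learned models in real labs.

```python
# Toy illustration (assumed example, not the platform's actual lab code):
# a keyword-based toxicity detector and a homoglyph attack that evades it.

TOXIC_WORDS = {"idiot", "stupid", "trash"}  # illustrative word list


def toy_detector(text: str) -> bool:
    """Flag text as toxic if any token matches the keyword list."""
    tokens = text.lower().split()
    return any(tok.strip(".,!?") in TOXIC_WORDS for tok in tokens)


def homoglyph_attack(text: str) -> str:
    """Replace Latin letters with look-alike characters to dodge keyword matching."""
    substitutions = {"i": "і", "o": "0", "a": "а"}  # Cyrillic і/а, digit zero
    return "".join(substitutions.get(c, c) for c in text)


original = "you are an idiot"
perturbed = homoglyph_attack(original)
print(toy_detector(original))   # True  -> detected
print(toy_detector(perturbed))  # False -> evaded, yet still readable to humans
```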